(written 3.17.2025) This is how I imagine this report:
Next step:
- These cluster assignments (and overlaps) can then be used to evaluate turnover.
- I think “turnover” will be a separate project.
An idea:
- A child that defines the “count” and “sum rate”. Possibly import my ppt slides?
Looks like I shouldn’t run anything in the master document, but run everything as children.
I’m really going to want to see where the pareto performers are.
Rate_cluster | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
1 | 0.0 | 0.0 | 41.2 | 37.4 | 21.3 |
2 | 0.0 | 0.0 | 0.3 | 52.7 | 47.0 |
3 | 0.0 | 4.2 | 0.0 | 59.5 | 36.3 |
4 | 0.0 | 8.6 | 0.0 | 42.9 | 48.6 |
5 | 0.0 | 38.4 | 0.0 | 16.6 | 45.0 |
6 | 0.0 | 44.2 | 0.0 | 25.0 | 30.8 |
7 | 0.0 | 76.4 | 0.0 | 1.2 | 22.4 |
8 | 35.3 | 62.5 | 0.0 | 0.0 | 2.2 |
Complex_cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0 |
2 | 0.0 | 0.0 | 2.0 | 5.3 | 14.6 | 17.4 | 31.7 | 29.0 |
3 | 98.9 | 1.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 14.2 | 34.9 | 20.3 | 18.8 | 4.5 | 7.0 | 0.4 | 0.0 |
5 | 8.0 | 30.9 | 12.2 | 21.1 | 12.1 | 8.5 | 6.6 | 0.7 |
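A minimal sketch of how overlap tables like the two above could be built, assuming each PI carries one label from each clustering. The function name `row_percentages` and the six example assignments are hypothetical and are not taken from the report's data.

```python
from collections import Counter

def row_percentages(row_labels, col_labels):
    """Cross-tabulate two cluster assignments; each row sums to roughly 100%."""
    pair_counts = Counter(zip(row_labels, col_labels))
    row_totals = Counter(row_labels)
    cols = sorted(set(col_labels))
    table = {}
    for r in sorted(row_totals):
        # Share of the PIs in row cluster r that fall in each column cluster.
        table[r] = {
            c: round(100.0 * pair_counts[(r, c)] / row_totals[r], 1) for c in cols
        }
    return table

# Hypothetical assignments for six PIs (illustration only).
rate_cluster = [1, 1, 1, 1, 8, 8]
complex_cluster = [3, 3, 4, 5, 1, 1]

overlap = row_percentages(rate_cluster, complex_cluster)
for r, row in overlap.items():
    print(r, row)
```

Swapping the two arguments gives the second orientation (complex cluster in the rows, rate cluster in the columns).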
Low performers are in Rates Cluster 1 (<20% win rates by count and sum.)
Strike outs:
These PI’s have never won a proposal. Strike-outs are located in the overlap with Complex Cluster 3 (“Pipe Dreams”).
Sputtering:
These PI’s submit a decent number of proposals with relatively few wins.
They are located in the overlap with Complex Cluster 4 (“Plucky”).
Many at bats, few hits:
These PI’s submit a large number of proposals with relatively few wins.
They are located in the overlap with Complex Cluster 5 (“Prolific”).
These PIs simultaneously carry a win rate below 20% while also being Top 20% performers. They contribute to 80% of the total funds won over the last ten years. (Maybe exclude them from the other clusters?)
Notes:
Try showing the boxplots, win and loss, for each of these sub-categories
I need to set the same scale for timeDots.
I need to modify timeDots according to my new and improved functions.
I should show the tick marks (timeLine) as well as the timeDots.
What about people who have a low win rate but a high total sum? At the bottom right of the “Funds requested” graph?
Maybe I have two criteria for low performers – (1) Low win rate (what I have so far) (2) Low total funds requested won (who would this be? I guess I want a pareto already!)
I need to see these by “total funds requested won” per PI
I need to examine the values that have a very low win rate, but have still pulled in enough money to be a significant contributor to the U
I can color the ones in the graph above according to 80th percentile ….?
It would be really cool to replace my background gray grid with transparently changing viridis bands according to the percentiles, or the cumulative percentile or something like that.
I need to include complex cluster 2, it’s just incomplete otherwise.
And maybe we don’t call these “low performers” but we call them “wasted effort” or something like that?
I mean, how can you call someone a low performer when they are in the 90th percentile for funds brought in?
And how can you call someone a low performer when they nabbed 75% of their proposals, but are in the cumulative 10th percentile?
“Low” would be a measure of performance against your own potential, or of your peers’.
I guess I’d want to carefully designate the “low” performers that are above the 80th percentile. That’s a curious sub-set.
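A rough sketch of the Pareto idea from the notes above, assuming I have total funds won and a win rate per PI. The function `pareto_flags`, the 80% cutoff as a parameter, and the five example PIs are all hypothetical; nothing here comes from the real data.

```python
def pareto_flags(funds_won_by_pi, share=0.80):
    """Flag the PIs who, ranked by funds won, make up the first `share` of the total."""
    total = sum(funds_won_by_pi.values())
    ranked = sorted(funds_won_by_pi.items(), key=lambda kv: kv[1], reverse=True)
    flags, running = {}, 0.0
    for pi, funds in ranked:
        # A PI counts as a top contributor while the cumulative total
        # of everyone ranked above them is still below the cutoff.
        flags[pi] = total > 0 and running < share * total
        running += funds
    return flags

# Hypothetical PIs: total funds won and win rate by count.
funds_won = {"A": 500, "B": 250, "C": 150, "D": 60, "E": 40}
win_rate = {"A": 0.15, "B": 0.60, "C": 0.10, "D": 0.75, "E": 0.05}

top = pareto_flags(funds_won)
# The curious subset: low win rate, yet inside the top-contributor group.
low_rate_top = sorted(pi for pi in funds_won if top[pi] and win_rate[pi] < 0.20)
print(low_rate_top)
```

In this toy data, PI "D" wins 75% of proposals but sits outside the top-contributor group, which is the other case I was wondering about.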
Using the natural breaks identified by clustering the counts of proposals per principal investigator, the 1,265 principal investigators who submitted three or fewer proposals are filtered out. The 1,672 principal investigators who submitted four or more proposals are kept for further clustering.
This shows principal investigators sorted into five clusters using multiple criteria.
First, principal investigators (PIs) with three or fewer proposals are filtered out. 1,265 principal investigators (43%) are removed and 1,672 principal investigators (57%) are kept.
Second, several variables are calculated per principal investigator:
Third, all variables except the rates are transformed with a natural log; then all variables are centered and scaled.
Fourth, principal components are extracted.
Fifth, hierarchical clustering using Euclidean distance and Ward’s method is used to produce five clusters.
The clusters are named, described verbally and visually, and the cluster populations by colleges and major institutions are shown.
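The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the code used to produce this report: it uses Python with NumPy and SciPy, the function name `cluster_pis` and the synthetic inputs are hypothetical, `log1p` is used so that zero values stay finite, and all principal components are kept because the number retained is not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_pis(amounts, rates, n_clusters=5):
    """amounts: (n, k) non-negative count/dollar variables; rates: (n, m) win rates."""
    # Step 3: log-transform the non-rate variables, then center and scale everything.
    # log1p is an assumption so that zeros (e.g. no funds won) stay finite.
    features = np.hstack([np.log1p(amounts), rates])
    z = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)
    # Step 4: principal component scores via SVD of the standardized matrix.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt.T
    # Step 5: Ward linkage on Euclidean distances, cut into n_clusters groups.
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Synthetic stand-in data (not the report's data).
rng = np.random.default_rng(0)
n = 60
proposal_counts = rng.integers(1, 40, size=n)
funds_requested = rng.lognormal(mean=12, sigma=1, size=n)
win_rate = rng.uniform(0, 1, size=n)

keep = proposal_counts >= 4  # Step 1: drop PIs with three or fewer proposals
labels = cluster_pis(
    np.column_stack([proposal_counts[keep], funds_requested[keep]]),
    win_rate[keep].reshape(-1, 1),
)
print(np.bincount(labels)[1:])  # cluster sizes
```

Because every component is kept, the Euclidean distances between PIs are the same as in the standardized variables; dropping trailing components before `linkage` would change that.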
CLUSTER 1: PERFECT
Principal investigators in Cluster 1 are perfect in their attempts with
100% win rates of relatively smaller proposals.
A high proportion of PIs from the School of Business are found here.
It has a population of 65 PIs (3.9%) and accounts for only 1.7% of total requested funds won.
CLUSTER 2: PRECISE
Principal investigators in Cluster 2 don’t have the perfect record of
Cluster 1, but win a high proportion of their proposals on both a count
and sum basis. They bring in nearly as much as Cluster 4 in total funds
requested won by racking up many relatively smaller wins.
It has a population of 397 PIs (23.7%) and accounts for 13.3% of total requested funds won.
CLUSTER 3: PIPE DREAMS
Cluster 3 contains principal investigators with zero and near-zero win
rates. They ambitiously attempt proposals as large as the principal
investigators in Clusters 4 and 5 – just without success.
It has high proportions of PIs from the colleges of Law, Health, and Social and Behavioral Sciences.
It has a population of 88 PIs (5.3%) and accounts for 0% of total requested funds won.
CLUSTER 4: PLUCKY
Cluster 4 contains principal investigators who mainly differ from
Cluster 5 in the count of proposals submitted. They submit equivalently
large proposals, just far fewer – less than a third of the proposals
submitted by Cluster 5. Some of this is due to longevity (PIs in
Cluster 4 have been submitting bids for a median of 1.86 years compared
to 3.7 years in Cluster 5), but some is also due to the rate of
submission: PIs in Cluster 5 submit a median of 30% more bids per
year.
Cluster 4 PIs also slightly lag Cluster 5 PIs in win rates (count and sum).
It has a population of 558 PIs (33.4%) and accounts for 14.2% of total requested funds won.
CLUSTER 5: PROLIFIC
Cluster 5 contains principal investigators with a prolific record of
submission. They submit a large number of proposals – triple Cluster 4,
the next highest – with impressive win rates on large proposals. These
are the workhorses of University of Utah research.
Principal investigators from the institutions (EGI, CTSI, CVRTI, ICSE, and SCI) tend to show up here.
It has a population of 564 PIs (33.7%) and accounts for a large majority (70.8%) of total requested funds won.
cluster | population | pop_perc | count_total | count_perc | win.sum | sum_perc | win.count | win.mean | win.median | loss.sum | loss.count | loss.mean | loss.median | sum.rate | count.rate | color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 65 | 3.9 | 400 | 2 | 84,000,000 | 2 | 400 | 208,000 | 50,000 | 0 | 0 | 0 | 0 | 1.00 | 1.00 | forestgreen |
2 | 397 | 23.7 | 3,910 | 15 | 678,000,000 | 13 | 2,920 | 232,000 | 47,000 | 210,000,000 | 1,000 | 210,700 | 68,200 | 0.76 | 0.75 | deepskyblue |
3 | 88 | 5.3 | 570 | 2 | 0 | 0 | 0 | 0 | 0 | 408,000,000 | 560 | 721,500 | 413,600 | 0.00 | 0.00 | goldenrod |
4 | 558 | 33.4 | 4,830 | 18 | 723,000,000 | 14 | 1,500 | 482,000 | 138,000 | 2,762,000,000 | 3,330 | 829,700 | 409,800 | 0.21 | 0.31 | firebrick |
5 | 564 | 33.7 | 16,960 | 64 | 3,608,000,000 | 71 | 7,230 | 499,000 | 91,000 | 9,848,000,000 | 9,740 | 1,011,600 | 417,900 | 0.27 | 0.43 | darkslategray |
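As a worked check on the table above, the derived columns can be recomputed from the raw ones. This sketch assumes that sum.rate is win.sum / (win.sum + loss.sum) and count.rate is win.count / count_total; those definitions are inferred from the column names, and they reproduce the printed values. The inputs are copied from the table, where the sums and counts appear to be rounded.

```python
rows = [
    # (cluster, population, count_total, win_sum, win_count, loss_sum)
    (1, 65, 400, 84e6, 400, 0),
    (2, 397, 3910, 678e6, 2920, 210e6),
    (3, 88, 570, 0, 0, 408e6),
    (4, 558, 4830, 723e6, 1500, 2762e6),
    (5, 564, 16960, 3608e6, 7230, 9848e6),
]

total_pop = sum(r[1] for r in rows)   # should match the 1,672 PIs kept
total_won = sum(r[3] for r in rows)   # total requested funds won (rounded inputs)

derived = {}
for cluster, pop, n_props, win_sum, win_count, loss_sum in rows:
    derived[cluster] = {
        "pop_perc": round(100 * pop / total_pop, 1),
        "sum_perc": round(100 * win_sum / total_won),
        # Assumed definitions, inferred from the column names.
        "sum_rate": round(win_sum / (win_sum + loss_sum), 2),
        "count_rate": round(win_count / n_props, 2),
    }

for cluster, values in derived.items():
    print(cluster, values)
```

Because win.sum is rounded to the nearest million in the table, shares recomputed this way can differ in the last decimal from the percentages quoted in the cluster descriptions.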
Although this clustering did not include any time variables, the submission patterns by cluster are shown.
259 PIs (15.5%) appear in multiple colleges or institutions.
Org | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
Arch | 0.0 | 41.2 | 0.0 | 47.1 | 11.8 |
Bus | 33.3 | 46.7 | 0.0 | 13.3 | 6.7 |
CTSI | 0.0 | 16.7 | 8.3 | 25.0 | 50.0 |
CVRTI | 0.0 | 11.1 | 5.6 | 22.2 | 61.1 |
Dent | 0.0 | 11.1 | 0.0 | 55.6 | 33.3 |
Educ | 6.9 | 20.7 | 10.3 | 55.2 | 6.9 |
EGI | 10.5 | 21.1 | 10.5 | 15.8 | 42.1 |
Engr | 0.4 | 6.8 | 4.3 | 34.9 | 53.6 |
FinArt | 0.0 | 44.4 | 0.0 | 44.4 | 11.1 |
Health | 3.6 | 21.4 | 14.3 | 39.3 | 21.4 |
Hum | 6.2 | 43.8 | 6.2 | 25.0 | 18.8 |
Hunt | 0.6 | 28.2 | 2.6 | 27.6 | 41.0 |
ICSE | 10.0 | 10.0 | 0.0 | 30.0 | 50.0 |
Law | 0.0 | 40.0 | 20.0 | 40.0 | 0.0 |
Med | 4.0 | 28.1 | 4.4 | 27.2 | 36.4 |
Nurs | 5.7 | 8.6 | 11.4 | 45.7 | 28.6 |
other | 6.4 | 25.6 | 2.6 | 21.8 | 43.6 |
Pharm | 0.0 | 19.4 | 3.2 | 35.5 | 41.9 |
SCI | 5.4 | 16.2 | 0.0 | 29.7 | 48.6 |
Science | 3.7 | 19.4 | 4.6 | 41.0 | 31.3 |
SocBeh | 3.8 | 20.3 | 13.9 | 41.8 | 20.3 |
SocWrk | 4.3 | 43.5 | 4.3 | 39.1 | 8.7 |
Tran | 0.0 | 50.0 | 0.0 | 50.0 | 0.0 |
Cluster | Arch | Bus | CTSI | CVRTI | Dent | Educ | EGI | Engr | FinArt | Health | Hum | Hunt | ICSE | Law | Med | Nurs | other | Pharm | SCI | Science | SocBeh | SocWrk | Tran |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.0 | 7.5 | 0.0 | 0.0 | 0.0 | 3.0 | 3.0 | 1.5 | 0.0 | 3.0 | 1.5 | 1.5 | 1.5 | 0.0 | 46.3 | 3.0 | 7.5 | 0.0 | 3.0 | 11.9 | 4.5 | 1.5 | 0.0 |
2 | 1.6 | 1.6 | 0.9 | 0.4 | 0.2 | 1.3 | 0.9 | 3.6 | 0.9 | 2.7 | 1.6 | 9.9 | 0.2 | 0.4 | 49.1 | 0.7 | 4.5 | 2.7 | 1.3 | 9.4 | 3.6 | 2.2 | 0.2 |
3 | 0.0 | 0.0 | 2.1 | 1.0 | 0.0 | 3.1 | 2.1 | 10.4 | 0.0 | 8.3 | 1.0 | 4.2 | 0.0 | 1.0 | 35.4 | 4.2 | 2.1 | 2.1 | 0.0 | 10.4 | 11.5 | 1.0 | 0.0 |
4 | 1.3 | 0.3 | 1.0 | 0.7 | 0.8 | 2.6 | 0.5 | 13.4 | 0.7 | 3.6 | 0.7 | 7.0 | 0.5 | 0.3 | 34.5 | 2.6 | 2.8 | 3.6 | 1.8 | 14.5 | 5.4 | 1.5 | 0.2 |
5 | 0.3 | 0.1 | 1.7 | 1.6 | 0.4 | 0.3 | 1.1 | 17.8 | 0.1 | 1.7 | 0.4 | 9.0 | 0.7 | 0.0 | 40.1 | 1.4 | 4.8 | 3.7 | 2.5 | 9.6 | 2.3 | 0.3 | 0.0 |
This shows principal investigators sorted into eight clusters using only two criteria: the win rates by count of proposals and by sum of funds requested.
First, principal investigators (PIs) with three or fewer proposals are filtered out. 1,265 principal investigators (43%) are removed and 1,672 principal investigators (57%) are kept.
Second, two variables are calculated per principal investigator: the win rate by count of proposals and the win rate by sum of funds requested.
Third, the rates are centered and scaled.
Fourth, principal components are extracted.
Fifth, hierarchical clustering using Euclidean distance and Ward’s method is used to produce eight clusters.
The clusters are named, described verbally and visually, and the cluster populations by colleges and major institutions are shown.
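The first two steps above (filtering and computing the two rates) might look like the sketch below. It assumes each proposal is a record of PI, amount requested, and whether it was won; the function `win_rates`, the tuple layout, and the example proposals are hypothetical.

```python
from collections import defaultdict

def win_rates(proposals, min_proposals=4):
    """proposals: iterable of (pi_id, amount_requested, won) tuples.

    Returns {pi_id: (count_rate, sum_rate)} for PIs with enough proposals.
    """
    stats = defaultdict(lambda: {"n": 0, "wins": 0, "requested": 0.0, "won": 0.0})
    for pi, amount, won in proposals:
        s = stats[pi]
        s["n"] += 1
        s["requested"] += amount
        if won:
            s["wins"] += 1
            s["won"] += amount
    rates = {}
    for pi, s in stats.items():
        if s["n"] < min_proposals:
            continue  # Step 1: drop PIs with three or fewer proposals
        count_rate = s["wins"] / s["n"]
        sum_rate = s["won"] / s["requested"] if s["requested"] else 0.0
        rates[pi] = (count_rate, sum_rate)
    return rates

# Hypothetical proposals: PI "A" wins three small ones and loses one large one.
proposals = [
    ("A", 50_000, True), ("A", 50_000, True), ("A", 50_000, True),
    ("A", 850_000, False),
    ("B", 100_000, True), ("B", 100_000, False),
]
rates = win_rates(proposals)
print(rates)
```

PI "A" ends up with a high win rate by count and a low win rate by sum, the pattern described for the clusters that lose their larger proposals; PI "B" is dropped by the filter.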
CLUSTER 1: ROCK BOTTOM
Principal investigators in Cluster 1 have less than ~20% win rates by
both count and sum.
It has a population of 211 PIs (12.6%) and accounts for only 2.1% of total requested funds won.
CLUSTER 2: DOMINANT
Principal investigators in Cluster 2 have less than ~40% win rates by
both count and sum.
It has the largest share of the population and proposals submitted, as well as a large share of funds awarded.
It has a population of 370 PIs (22.1%) and accounts for 16.6% of total requested funds won.
CLUSTER 3: BIG WHIFF
Principal investigators in Cluster 3 have less than ~30% win rates by
sum and less than ~70% win rate by count.
They have a respectable win rate by count but lose the larger proposals.
Cluster 3 has a population of 190 PIs (11.4%) and accounts for 5.5% of total requested funds won.
CLUSTER 4: MONEY
Principal investigators in Cluster 4 have less than ~60% win rates by
sum and less than ~50% win rates by count.
It has the largest share of funds requested won.
Cluster 4 has a population of 245 PIs (14.7%) and accounts for 25.5% of total requested funds won.
CLUSTER 5: MISSING THE BIG WINS
Principal investigators in Cluster 5 have less than ~50% win rates by
sum and greater than ~50% win rates by count.
They have an excellent win rate by count but, like Cluster 3, they lose the larger proposals.
Cluster 5 has a population of 151 PIs (9.0%) and accounts for 9.9% of total requested funds won.
CLUSTER 6: BRONZE MEDAL
Principal investigators in Cluster 6 have greater than ~40% win rates by sum and less than ~65% win rates by count.
Their win rates by sum are in third place behind Clusters 7 and 8. Unlike Clusters 3 and 5, they win the large proposals.
Cluster 6 has a population of 156 PIs (9.3%) and accounts for 17.1% of total requested funds won.
CLUSTER 7: SILVER MEDAL
Principal investigators in Cluster 7 have greater than ~50% win rates by
sum and greater than ~60% win rates by count. They have the
second-highest win rates.
Cluster 7 has a population of 165 PIs (9.9%) and accounts for 14.1% of total requested funds won.
CLUSTER 8: GOLD MEDAL
Principal investigators in Cluster 8 have greater than ~80% win rates by
sum and greater than ~65% win rates by count. They have the highest win
rates.
Cluster 8 has a population of 184 PIs (11.0%) and accounts for 9.3% of total requested funds won.
cluster | population | pop_perc | count_total | count_perc | win.sum | sum_perc | win.count | win.mean | win.median | loss.sum | loss.count | loss.mean | loss.median | sum.rate | count.rate | color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 211 | 12.6 | 2,780 | 10 | 106,000,000 | 2 | 250 | 420,000 | 152,000 | 2,384,000,000 | 2,530 | 943,500 | 422,600 | 0.04 | 0.09 | peru |
2 | 370 | 22.1 | 6,890 | 26 | 845,000,000 | 17 | 1,770 | 479,000 | 160,000 | 5,025,000,000 | 5,120 | 981,100 | 419,400 | 0.14 | 0.26 | darkseagreen |
3 | 190 | 11.4 | 2,820 | 11 | 278,000,000 | 6 | 1,350 | 206,000 | 50,000 | 1,721,000,000 | 1,470 | 1,169,100 | 412,800 | 0.14 | 0.48 | darkcyan |
4 | 245 | 14.7 | 4,180 | 16 | 1,297,000,000 | 26 | 1,580 | 821,000 | 250,000 | 2,130,000,000 | 2,600 | 819,400 | 360,300 | 0.38 | 0.38 | slateblue |
5 | 151 | 9.0 | 3,300 | 12 | 505,000,000 | 10 | 2,270 | 223,000 | 38,000 | 1,127,000,000 | 1,030 | 1,090,200 | 295,800 | 0.31 | 0.69 | goldenrod |
6 | 156 | 9.3 | 1,960 | 7 | 870,000,000 | 17 | 990 | 883,000 | 223,000 | 472,000,000 | 970 | 486,200 | 187,200 | 0.65 | 0.50 | indianred |
7 | 165 | 9.9 | 2,880 | 11 | 719,000,000 | 14 | 2,220 | 324,000 | 53,000 | 340,000,000 | 660 | 514,900 | 130,000 | 0.68 | 0.77 | mediumaquamarine |
8 | 184 | 11.0 | 1,870 | 7 | 474,000,000 | 9 | 1,630 | 290,000 | 55,000 | 27,000,000 | 240 | 114,300 | 38,900 | 0.95 | 0.87 | firebrick |
This section is omitted because only two variables were used in clustering, and both are relevant to all clusters.
Although this clustering did not include any time variables, the submission patterns by cluster are shown.
259 PIs (15.5%) appear in multiple colleges or institutions.
Org | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
Arch | 0.0 | 29.4 | 17.6 | 29.4 | 11.8 | 0.0 | 5.9 | 5.9 |
Bus | 6.7 | 6.7 | 6.7 | 6.7 | 0.0 | 13.3 | 0.0 | 60.0 |
CTSI | 12.5 | 8.3 | 20.8 | 12.5 | 4.2 | 20.8 | 16.7 | 4.2 |
CVRTI | 5.6 | 33.3 | 11.1 | 22.2 | 11.1 | 5.6 | 5.6 | 5.6 |
Dent | 11.1 | 44.4 | 22.2 | 0.0 | 0.0 | 11.1 | 11.1 | 0.0 |
Educ | 17.2 | 20.7 | 10.3 | 17.2 | 3.4 | 3.4 | 13.8 | 13.8 |
EGI | 10.5 | 15.8 | 21.1 | 15.8 | 10.5 | 0.0 | 0.0 | 26.3 |
Engr | 23.0 | 39.1 | 9.8 | 11.9 | 5.5 | 5.5 | 3.4 | 1.7 |
FinArt | 0.0 | 22.2 | 22.2 | 0.0 | 0.0 | 22.2 | 0.0 | 33.3 |
Health | 17.9 | 33.9 | 1.8 | 19.6 | 3.6 | 8.9 | 5.4 | 8.9 |
Hum | 6.2 | 25.0 | 6.2 | 12.5 | 0.0 | 6.2 | 18.8 | 25.0 |
Hunt | 7.7 | 28.2 | 13.5 | 16.0 | 11.5 | 5.8 | 10.9 | 6.4 |
ICSE | 0.0 | 30.0 | 30.0 | 20.0 | 10.0 | 0.0 | 0.0 | 10.0 |
Law | 20.0 | 0.0 | 40.0 | 20.0 | 0.0 | 0.0 | 20.0 | 0.0 |
Med | 10.0 | 17.3 | 12.4 | 13.7 | 11.5 | 9.0 | 13.7 | 12.3 |
Nurs | 17.1 | 22.9 | 22.9 | 17.1 | 8.6 | 2.9 | 2.9 | 5.7 |
other | 6.4 | 17.9 | 14.1 | 12.8 | 14.1 | 17.9 | 6.4 | 10.3 |
Pharm | 12.9 | 35.5 | 6.5 | 11.3 | 11.3 | 6.5 | 6.5 | 9.7 |
SCI | 2.7 | 16.2 | 13.5 | 24.3 | 10.8 | 13.5 | 5.4 | 13.5 |
Science | 12.0 | 23.0 | 9.7 | 15.2 | 6.5 | 14.3 | 9.7 | 9.7 |
SocBeh | 16.5 | 24.1 | 8.9 | 21.5 | 5.1 | 8.9 | 3.8 | 11.4 |
SocWrk | 8.7 | 17.4 | 4.3 | 17.4 | 4.3 | 13.0 | 17.4 | 17.4 |
Tran | 0.0 | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | 50.0 | 0.0 |
Cluster | Arch | Bus | CTSI | CVRTI | Dent | Educ | EGI | Engr | FinArt | Health | Hum | Hunt | ICSE | Law | Med | Nurs | other | Pharm | SCI | Science | SocBeh | SocWrk | Tran |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.0 | 0.4 | 1.3 | 0.4 | 0.4 | 2.2 | 0.9 | 23.5 | 0.0 | 4.3 | 0.4 | 5.2 | 0.0 | 0.4 | 33.9 | 2.6 | 2.2 | 3.5 | 0.4 | 11.3 | 5.7 | 0.9 | 0.0 |
2 | 1.1 | 0.2 | 0.4 | 1.3 | 0.9 | 1.3 | 0.7 | 20.4 | 0.4 | 4.2 | 0.9 | 9.8 | 0.7 | 0.0 | 30.0 | 1.8 | 3.1 | 4.9 | 1.3 | 11.1 | 4.2 | 0.9 | 0.2 |
3 | 1.3 | 0.4 | 2.2 | 0.9 | 0.9 | 1.3 | 1.8 | 10.1 | 0.9 | 0.4 | 0.4 | 9.3 | 1.3 | 0.9 | 42.7 | 3.5 | 4.8 | 1.8 | 2.2 | 9.3 | 3.1 | 0.4 | 0.0 |
4 | 1.8 | 0.4 | 1.1 | 1.4 | 0.0 | 1.8 | 1.1 | 9.9 | 0.0 | 3.9 | 0.7 | 8.8 | 0.7 | 0.4 | 37.8 | 2.1 | 3.5 | 2.5 | 3.2 | 11.7 | 6.0 | 1.4 | 0.0 |
5 | 1.1 | 0.0 | 0.6 | 1.1 | 0.0 | 0.6 | 1.1 | 7.4 | 0.0 | 1.1 | 0.0 | 10.2 | 0.6 | 0.0 | 51.1 | 1.7 | 6.2 | 4.0 | 2.3 | 8.0 | 2.3 | 0.6 | 0.0 |
6 | 0.0 | 1.1 | 2.9 | 0.6 | 0.6 | 0.6 | 0.0 | 7.4 | 1.1 | 2.9 | 0.6 | 5.1 | 0.0 | 0.0 | 40.0 | 0.6 | 8.0 | 2.3 | 2.9 | 17.7 | 4.0 | 1.7 | 0.0 |
7 | 0.5 | 0.0 | 2.1 | 0.5 | 0.5 | 2.1 | 0.0 | 4.2 | 0.0 | 1.6 | 1.6 | 8.9 | 0.0 | 0.5 | 56.0 | 0.5 | 2.6 | 2.1 | 1.0 | 11.0 | 1.6 | 2.1 | 0.5 |
8 | 0.5 | 4.5 | 0.5 | 0.5 | 0.0 | 2.0 | 2.5 | 2.0 | 1.5 | 2.5 | 2.0 | 5.0 | 0.5 | 0.0 | 48.2 | 1.0 | 4.0 | 3.0 | 2.5 | 10.6 | 4.5 | 2.0 | 0.0 |